The BLARK concept and BLARK for Arabic
Abstract
The EU project NEMLAR (Network for Euro-Mediterranean LAnguage Resources) on Arabic language resources carried out two surveys: one on the availability of Arabic language resources (LRs) in the region, and one on industrial requirements. The project also worked out a BLARK (Basic Language Resource Kit) for Arabic. In this paper we describe the further development of the BLARK concept made during the work on a BLARK for Arabic, as well as the results for Arabic.

1. The BLARK concept

The BLARK defines, ideally in a language-independent way, the minimal set of language resources needed to do any precompetitive language and speech technology research at all for a language. After the first BLARK article (Krauwer, 1998) in the ELRA Newsletter, the idea was taken up by the Dutch Language Union (DLU). Daelemans & Strik (2002) give an overview of the steps taken by the DLU to define the contents of the BLARK for Dutch and to assign priorities. Unfortunately this document is only available in Dutch. A summary of the work of the DLU and the results of the Dutch BLARK exercise can be found in Binnenpoorte et al. (2002). Later the ENABLER project also contributed to the definition of the BLARK concept.

The starting point of the definition process in Binnenpoorte et al. was a set of 8 classes of applications, seen as the most relevant application categories at that moment: computer assisted language learning, access control, speech input, speech output, dialogue systems, document production, information access and translation. For each class it was established which modules would be needed to build such applications (e.g. morphological analysis, text to phoneme converter), and for each of these modules it was analyzed which language data (e.g. data sets, descriptions) they would require, as well as their relative importance. The results were put together in a large matrix, on the basis of which one can determine which components serve most applications and which data are most needed for most applications, i.e. which elements should be part of the BLARK.

NEMLAR took this as the point of departure. We distinguish the BLARK definition, which is the set of general concepts governing the BLARK, from the BLARK specification, which is its instantiation for a given language, here Arabic. For the general discussion of the BLARK concept we first discuss a few important issues: availability, quality, quantity and standards.
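The matrix-based selection described above can be illustrated with a small sketch. It is only an assumption about how such a matrix might be tabulated, not the procedure used by Binnenpoorte et al. or NEMLAR: the application and module names are taken from the text, while the importance ratings, the function names `rank_modules` and `blark_candidates`, and the `min_apps` threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical importance ratings (0 = not needed ... 3 = essential).
# Application and module names come from the text; the numbers are
# invented purely for illustration.
APP_MODULES = {
    "speech output": {"text to phoneme converter": 3, "morphological analysis": 2},
    "dialogue systems": {"text to phoneme converter": 2, "morphological analysis": 2},
    "translation": {"morphological analysis": 3},
    "information access": {"morphological analysis": 2},
}


def rank_modules(app_modules):
    """Return (module, applications served, summed importance) tuples,
    ordered so that the module serving the most applications comes first."""
    served = defaultdict(int)
    weight = defaultdict(int)
    for modules in app_modules.values():
        for module, importance in modules.items():
            if importance > 0:
                served[module] += 1
                weight[module] += importance
    return sorted(
        ((m, served[m], weight[m]) for m in served),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


def blark_candidates(app_modules, min_apps):
    """Keep the modules that serve at least `min_apps` applications
    (the threshold is an assumed, adjustable cut-off)."""
    return [m for m, n, _ in rank_modules(app_modules) if n >= min_apps]


if __name__ == "__main__":
    for module, n_apps, total in rank_modules(APP_MODULES):
        print(f"{module}: serves {n_apps} applications, total importance {total}")
```

The same tabulation could be repeated one level down, mapping modules to the language data they require, to see which data sets are needed by the most modules.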
Similar resources
A BLARK extension for temporal annotation mining
The Basic Language Resource Kit (BLARK) proposed by Krauwer is designed for the creation of initial textual resources. There are a number of toolkits for the development of spoken language resources and systems, but tools for second level resources, that is, resources which are the result of processing primary level speech resources such as speech recordings. Typically, processing of this kind ...
Diabase: Towards a Diachronic BLARK in Support of Historical Studies
We present our ongoing work on language technology-based e-science in the humanities, social sciences and education, with a focus on text-based research in the historical sciences. An important aspect of language technology is the research infrastructure known by the acronym BLARK (Basic LAnguage Resource Kit). A BLARK as normally presented in the literature arguably reflects a modern standard ...
Dutch HLT resources: from BLARK to priority lists
In this paper we report on a project about Dutch Human Language Technologies (HLT) resources. In this project we first defined a so-called BLARK (Basic LAnguage Resources Kit). Subsequently, a survey was carried out to make an inventory and evaluation of existing Dutch HLT resources. Based on the information collected in the survey, a priority list was drawn up of materials that need to be deve...
The First Parallel Multilingual Corpus of Persian: Toward a Persian BLARK
In this article, we have introduced the first parallel corpus of Persian with more than 10 other European languages. This article describes primary steps toward preparing a Basic Language Resources Kit (BLARK) for Persian. Up to now, we have proposed morphosyntactic specification of Persian based on EAGLE/MULTEXT guidelines and specific resources of MULTEXT-East. The article introduces Persian ...
NEMLAR - An Arabic Language Resources Project
The NEMLAR project is a European Commission supported project with partners from the EU and from Arabic speaking countries in the Mediterranean region. The project aims at surveying the state of the art of language resources and tools for Arabic in the region, at developing a BLARK definition for Arabic, and at starting development of language resources or updating of existing language resources....
Publication date: 2006